Fast Conjugate Gradients with Multiple GPUs

Authors

  • Ali Cevahir
  • Akira Nukada
  • Satoshi Matsuoka
Abstract

Memory bandwidth is the limiting factor for the efficiency of sparse linear solvers. In this work, we describe a fast Conjugate Gradient (CG) solver for unstructured problems that runs on multiple GPUs installed on a single mainboard. The solver achieves double-precision accuracy with single-precision GPUs by using a mixed-precision iterative refinement algorithm. To achieve high computation speed, we propose a fast algorithm for sparse matrix-vector multiplication, the core operation of iterative solvers. The proposed multiplication algorithm uses GPU resources efficiently through caching, coalesced memory accesses, and load balancing between running threads. Experiments on a wide range of matrices show that our matrix-vector multiplication algorithm achieves up to 11.6 Gflops on a single GeForce 8800 GTS card and that the CG implementation achieves up to 24.6 Gflops with four GPUs.
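The mixed-precision idea in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' GPU implementation: the CSR storage format, the emulation of single precision by rounding through `struct`, the residual scaling, the tolerances, and every function name (`f32`, `spmv`, `cg`, `mixed_precision_solve`, `poisson_1d_csr`) are assumptions made for illustration. The inner CG solve runs entirely in (emulated) single precision, while the residual and the solution update are kept in double precision.

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def spmv(vals, cols, rowptr, x, rnd=float):
    """y = A x for a CSR matrix; rnd is applied to every intermediate result."""
    y = []
    for i in range(len(rowptr) - 1):
        acc = 0.0
        for k in range(rowptr[i], rowptr[i + 1]):
            acc = rnd(acc + rnd(vals[k] * x[cols[k]]))
        y.append(acc)
    return y

def dot(u, v, rnd=float):
    acc = 0.0
    for a, b in zip(u, v):
        acc = rnd(acc + rnd(a * b))
    return acc

def cg(vals, cols, rowptr, b, tol, max_iter, rnd=float):
    """Plain CG starting from x = 0; all arithmetic goes through rnd."""
    x = [0.0] * len(b)
    r, p = list(b), list(b)
    rr = dot(r, r, rnd)
    stop = tol * tol * rr          # squared relative-residual threshold
    for _ in range(max_iter):
        if rr <= stop:
            break
        q = spmv(vals, cols, rowptr, p, rnd)
        pq = dot(p, q, rnd)
        if pq <= 0.0:              # breakdown guard (assumption, not from the paper)
            break
        alpha = rnd(rr / pq)
        x = [rnd(xi + rnd(alpha * pi)) for xi, pi in zip(x, p)]
        r = [rnd(ri - rnd(alpha * qi)) for ri, qi in zip(r, q)]
        rr_new = dot(r, r, rnd)
        beta = rnd(rr_new / rr)
        p = [rnd(ri + rnd(beta * pi)) for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def mixed_precision_solve(vals, cols, rowptr, b, tol=1e-10, max_outer=50):
    """Iterative refinement: double-precision residual, single-precision CG."""
    vals32 = [f32(v) for v in vals]            # single-precision copy of A
    x = [0.0] * len(b)
    bnorm = dot(b, b) ** 0.5
    corrections = 0
    for _ in range(max_outer):
        ax = spmv(vals, cols, rowptr, x)       # double precision
        r = [bi - ai for bi, ai in zip(b, ax)]
        if dot(r, r) ** 0.5 <= tol * bnorm:
            break
        scale = max(abs(ri) for ri in r)       # keep single-precision values near 1
        r32 = [f32(ri / scale) for ri in r]
        d = cg(vals32, cols, rowptr, r32, 1e-4, 200, rnd=f32)
        x = [xi + scale * di for xi, di in zip(x, d)]   # double-precision update
        corrections += 1
    return x, corrections

def poisson_1d_csr(n):
    """Hypothetical test matrix: tridiagonal (-1, 2, -1) in CSR form."""
    vals, cols, rowptr = [], [], [0]
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                vals.append(v)
                cols.append(j)
        rowptr.append(len(vals))
    return vals, cols, rowptr
```

Calling `mixed_precision_solve(*poisson_1d_csr(20), [1.0] * 20)` returns the solution together with the number of single-precision correction solves that were needed. The design point is that only the cheap residual and update steps need double precision, so the bandwidth-heavy matrix-vector products inside CG can stay in single precision.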

Similar resources

Conjugate Gradients on Multiple GPUs

A GPU-accelerated Conjugate Gradient solver is tested on eight matrices with different structural and numerical characteristics. The first four matrices are obtained by discretizing the 3D Poisson equation, which arises in many fields such as computational fluid dynamics and heat transfer. Their relatively low bandwidth and low condition numbers make them ideal targets for G...

Multi-GPU Implementation for Iterative MR Image Reconstruction with Field Correction

INTRODUCTION: Many advanced MRI acquisition and reconstruction methods see limited application because of their high computational cost. For instance, iterative reconstruction algorithms (e.g. non-Cartesian k-space trajectory, or magnetic field inhomogeneity compensation) can improve image quality but suffer from low reconstruction speed. General-purpose computing on graphics processing units...

Conjugate Gradients on Graphic Hardware: Performance & Feasibility

The Conjugate Gradient method (CG), one of the most commonly used iterative methods for solving very large systems of equations, has a history of running at less than 10% of peak processor performance because of its memory-bound nature and irregular access patterns. Because of their low cost and very high bandwidth, GPUs are becoming an increasingly attractive choice as accelerators. ...

Conjugate gradient neural network in prediction of clay behavior and parameters sensitivities

The use of artificial neural networks has increased in many areas of engineering. In particular, this method has been applied to many geotechnical engineering problems and demonstrated some degree of success. A review of the literature reveals that it has been used successfully in modeling soil behavior, site characterization, earth retaining structures, settlement of structures, slope stabilit...

Conjugate gradient solvers on Intel Xeon Phi and NVIDIA GPUs

Lattice Quantum Chromodynamics simulations typically spend most of the runtime in inversions of the Fermion Matrix. This part is therefore frequently optimized for various HPC architectures. Here we compare the performance of the Intel® Xeon Phi™ to current Kepler-based NVIDIA® Tesla™ GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through inverti...

Publication date: 2009